Creating an R Package
1 Introduction
One of the fundamental roles of a statistician is to create methods to analyze data. This typically involves four components: developing the theory, translating the equations to computer code, a simulation study and a real data analysis. While these are enough to get published, it is unlikely your method will be used by others without a key fifth component: a software package. A package is a collection of reusable functions, the documentation that describes how to use them, tests and sample data. Packages provide a structured way to organize, use and distribute code to others and/or your future self. The objective of this workshop is to learn how to develop an R package. In addition to creating an R package from scratch, you will learn how to make it robust across platforms and future changes using continuous integration and unit testing. This workshop assumes familiarity with R, RStudio, writing functions, installing packages and loading libraries, and requires a GitHub account. This will be an interactive workshop.
2 Pre-workshop set-up
You must bring your own laptop. It is vital that you attempt to set up your system in advance. You cannot show up at the workshop with no preparation and keep up!
- R (version ≥ 3.6.0)
- RStudio (version ≥ 1.2.1335). This is a powerful graphical user interface (GUI) which makes the package creation process much easier.
- Git. I strongly recommend reading these setup instructions by Jenny Bryan for Mac/Windows/Linux and the Troubleshooting section.
- Please read Chapter 1: Why Git? Why GitHub? to understand the big picture and motivation for using Git and GitHub.
- Sign up for a GitHub account. We will use GitHub to host the source files of our R package. I also recommend reading Jenny Bryan’s advice on carefully choosing a username.
- GitKraken. This is a GUI for Git which makes it much easier to dive into version control without the command line. GitKraken is to Git what RStudio is to R. This is optional but highly recommended, particularly for new Git users. You are free to use the GUI of your choice or simply the command line. In this workshop I will be using GitKraken.
- Complete Section 3 of this tutorial.
- Run the following commands in R:

  ```r
  install.packages("pacman")

  # this command checks if you have the packages already installed,
  # then installs the missing packages, then loads the libraries
  pacman::p_load(knitr, rmarkdown, devtools, roxygen2, usethis)

  # identify yourself to Git with the usethis package
  # use the exact same username and email associated
  # with your GitHub account
  usethis::use_git_config(user.name = "gauss", user.email = "gauss@normal.org")
  ```
3 Git and GitHub
3.1 Introduction
This section walks you through the process of creating a GitHub repository (abbreviated as repo), creating a local copy of the repo (i.e. on your laptop), making some changes locally and updating your changes on the remote (aka GitHub repo). It assumes that you have successfully completed the requirements outlined in Section 2. The following figure summarizes some key terminology that we will make use of in this section:
Figure 3.1: source: http://ohi-science.org/data-science-training/
3.2 Annotations
For each step, I have provided screenshots annotated with red rectangles, circles and arrows. You can click on each image to enlarge it. The following table describes what each of the annotations represents.
| Annotation | Description |
|---|---|
| Red rectangle | Enter text or fill in the blank |
| Red circle | Click on the circled button |
| Red arrow | Take note of. No action is required. |
3.3 Step 1: Create a remote repo
We first create a GitHub repo. Head over to https://github.com and log in. Then click on new repository:
Give it a name. It can be anything you want (just pick a name that will remind you that this repository contains the source files of your R package). In the screenshots below I used rpkg throughout. Click on Create repository:
Copy the link of your newly created repo to your clipboard:
3.4 Step 2: New RStudio Project via git clone
Create a local copy of the remote repo using RStudio projects:
Click on Version Control:
Click on Git. Note that if you get an error or you don’t see this option, this likely means that your RStudio doesn’t know where to find your local Git installation. Please see Chapter 13: Detect Git from RStudio for troubleshooting this.
Paste the link to your remote repo in the Repository URL box, name the folder that will contain your R package files, and browse to where you want the folder to be saved in your filesystem. Click on Create Project:
Your RStudio window should open a new project in the specified directory. Take note of the following points annotated in the screenshot below:
- The Git tab allows you to use Git and push to GitHub within RStudio. You will see any changes that have been made to your files since the last commit here. I have found the RStudio interface to Git to be inadequate and slow. I just want you to be aware of this functionality. I only look at this tab to quickly see if there were any changes, but do all my version controlling and interfacing with GitHub using GitKraken.
- Shows the path of your working directory, which is set to the root of your RStudio project by default. You can always click on the arrow to return to the working directory.
- Indicates the name of your RStudio project. It’s also a dropdown menu for other recently opened RStudio projects.
- Filesystem viewer of your working directory. You should see a .gitignore file and the RStudio project file. These were automatically added by RStudio when you created a new project from a GitHub repo.
- A dropdown menu with extended Git functionalities.
3.5 Step 3: add and commit your changes
The following figure shows the commands needed for a basic version controlled workflow. Refer back to this figure once you complete Step 3 and then once again when you complete Step 4 (it should make a little more sense).
Figure 3.2: source: https://www.edureka.co/blog/git-tutorial/
Have you ever versioned a file by adding your initials or the date? That is effectively a commit, albeit only for a single file: it is a version that is significant to you and that you might want to inspect or revert to later (Bryan, STAT545TAs, and Hester 2019). The commit command is used to save your changes to the local repository. From the Git tab, click on Commit:
Note that you have to explicitly tell Git which changes you want to include in a commit before running the git commit command. This means that a file won’t be automatically included in the next commit just because it was changed. Instead, you need to use the git add command to mark the desired changes for inclusion. Instead of typing git add in the terminal, you can simply click the boxes next to the files you want to add (this is also referred to as staging a file). The lines 1 to 4 highlighted in green refer to the contents of the .gitignore file and the green highlight indicates they are being added to the file (red highlight indicates removal of a line):
Every time you make a commit you must also write a short commit message. Ideally, this conveys the motivation for the change. Remember, the diff will show the content. When you revisit a project after a break or need to digest recent changes made by a colleague, looking at the history, by reading commit messages and skimming through diffs, is an extremely efficient way to get up to speed (Bryan, STAT545TAs, and Hester 2019). Enter a commit message and click on the Commit button:
If everything worked, you should see the following screen with the commit message and the files that were added:
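For readers who like to see the commands behind the buttons, the stage-then-commit sequence described above can be sketched from R by calling the git command-line client with system2(). This is only an illustration under stated assumptions: git must be on your PATH, the work happens in a throwaway repository under tempdir() rather than in your project, and the helper git(), the repository name and the file contents are made up for the example.

```r
# Throwaway repository so the sketch never touches your real project
repo <- file.path(tempdir(), "commit-demo")
dir.create(repo, showWarnings = FALSE)

# Hypothetical helper: run `git -C <repo> <args>` and return its output lines
git <- function(...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE)
}

git("init")

# Create a file; Git reports it as untracked ("??") until it is staged
writeLines(c(".Rproj.user", ".Rhistory", ".RData", ".Ruserdata"),
           file.path(repo, ".gitignore"))
status_before <- git("status", "--porcelain")

# Stage the file: the equivalent of ticking its box in the Git tab
git("add", ".gitignore")
status_staged <- git("status", "--porcelain")

# Commit the staged change with a short message. The identity is passed
# inline so the sketch does not depend on your global Git configuration.
git("-c", "user.name=gauss", "-c", "user.email=gauss@normal.org",
    "commit", "-m", shQuote("Add .gitignore"))

# One line per commit: abbreviated ID followed by the commit message
commit_log <- git("log", "--oneline")
print(commit_log)
```

Comparing status_before with status_staged shows the point made above: a changed file is not part of the next commit until it has been added explicitly.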
3.6 Step 4: push your local commits
The push command is used to publish new local commits on a remote server (the remote repo you created in Step 1):
Enter your username:
and your password:
Note the following:
- The URL of the remote repo.
- The name of the local branch called master. (We’ll talk about branches later).
- The name of the remote branch also called master. master -> master indicates that you have pushed the commit from the local master branch to the remote master branch.
- The command you can enter in the terminal instead of using the RStudio interface to push your commit to the remote.
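To experiment with pushing without touching GitHub, the sketch below uses a local bare repository as a stand-in for the remote. Everything here is an assumption made for illustration: git must be on your PATH, the directory names and the helper run() are hypothetical, and the branch may be called main rather than master depending on how your Git is configured, which is why the code asks Git for the branch name instead of hard-coding it.

```r
root        <- file.path(tempdir(), "push-demo")
local_repo  <- file.path(root, "local")
remote_repo <- file.path(root, "remote.git")  # stand-in for the GitHub repo
dir.create(local_repo, recursive = TRUE, showWarnings = FALSE)

# Hypothetical helper: run `git -C <dir> <args>` and return its output lines
run <- function(dir, ...) {
  system2("git", c("-C", shQuote(dir), ...), stdout = TRUE)
}

# A bare repository has no working files; it only receives pushes
system2("git", c("init", "--bare", shQuote(remote_repo)), stdout = TRUE)

# Local repository with a single commit
run(local_repo, "init")
writeLines(".Rproj.user", file.path(local_repo, ".gitignore"))
run(local_repo, "add", ".gitignore")
run(local_repo, "-c", "user.name=gauss", "-c", "user.email=gauss@normal.org",
    "commit", "-m", shQuote("Add .gitignore"))

# Register the remote under the conventional name "origin"
run(local_repo, "remote", "add", "origin", shQuote(remote_repo))

# Name of the current local branch (master or main)
branch <- run(local_repo, "rev-parse", "--abbrev-ref", "HEAD")

# Publish the local commits: local <branch> -> remote <branch>
run(local_repo, "push", "origin", branch)

# After the push, both repositories point at the same commit ID
local_id  <- run(local_repo, "rev-parse", "HEAD")
remote_id <- run(remote_repo, "rev-parse", branch)
print(c(local = local_id, remote = remote_id))
```

With a GitHub repo the only difference is that origin is an https or ssh URL and the push asks for your credentials.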
Head over to your remote GitHub repo and take note of the following:
- The newly added files.
- The commit message.
- You are currently viewing the contents of the master branch.
- The unique ID of the commit. A Git commit ID is a SHA-1 hash of every important thing about the commit. Clicking on it will allow you to see the difference (aka diff) between this commit and the previous one.
- The number of commits (aka snapshots of the repo).
- The number of branches.
3.7 Step 5: Open the repo with GitKraken
Link your GitHub account to GitKraken; you will be prompted for this when opening the GitKraken application for the first time. Open the local repo created in Step 2:
The following screenshot shows the local repo in the GitKraken GUI. Note the following (which has similar attributes to the online GitHub repo):
- The newly added files.
- The commit message.
- You are currently viewing the contents of the master branch.
- The unique ID of the commit. A Git commit ID is a SHA-1 hash of every important thing about the commit. Clicking on it will allow you to see the difference (aka diff) between this commit and the previous one.
- The branches available locally.
- The branches available on the remote.
3.8 Discussion
Hopefully you were able to successfully complete all the steps in this Section. The main takeaway is to be able to add, commit, and push your local commits to the remote repo. It’s completely normal if you still have very little understanding of what just happened. I will clarify during the workshop. The point was for you to take a first stab at using version control and come to the workshop as prepared as possible.
4 Quick Start
This section runs through the development of a small toy package. It’s meant to illustrate the most important components of an R package. We will then provide a detailed treatment of the key components in the next sections.
4.1 Package Structure
A package is a convention for organizing files into directories. Figure 4.1 shows the 7 most common parts of an R package.
Figure 4.1: source: https://rawgit.com/rstudio/cheatsheets/master/package-development.pdf
- DESCRIPTION file (required): contains key metadata for the package that is used by repositories like CRAN and by R itself. This file contains the package name, the version number, the author and maintainer contact information, the license information, as well as any dependencies on other packages.
- NAMESPACE file (required): specifies the interface to the package that is presented to the user. This is done via a series of export() statements, which indicate which functions in the package are exported to the user. Functions that are not exported cannot be called directly by the user (or they must use :::). In addition to exports, the NAMESPACE file also specifies what functions or packages are imported by the package. If your package depends on functions from another package, you must import them via the NAMESPACE file.
- R sub-directory (required): contains all of your R code, either in a single file, or in multiple files. For larger packages it’s usually best to split code up into multiple files that logically group functions together. The names of the R code files do not matter, but generally it’s not a good idea to have spaces in the file names.
- man sub-directory (required): contains the documentation files for all of the exported objects of a package. The roxygen2 package allows you to write the documentation directly into the R code files. Therefore, you will likely have little interaction with the man directory as all of the files in there will be auto-generated by the roxygen2 package from the R code files.
- tests sub-directory: to store tests that will alert you if your code breaks.
- vignettes sub-directory: holds documents that teach your users how to solve real problems with your tools.
- data sub-directory: allows you to include data with your package.
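To make the layout concrete, here is a minimal sketch that builds a bare-bones package skeleton by hand using only base R. The package name toypkg, the functions add_one() and helper(), and all field values are hypothetical and chosen purely for illustration; in practice these files are usually generated by tools rather than written manually.

```r
# Build the skeleton in a temporary directory
pkg <- file.path(tempdir(), "toypkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: package metadata stored as "Field: value" pairs
write.dcf(
  data.frame(
    Package     = "toypkg",
    Title       = "A Toy Package",
    Version     = "0.0.0.9000",
    Description = "Illustrates the layout of an R package.",
    License     = "GPL-3",
    stringsAsFactors = FALSE
  ),
  file = file.path(pkg, "DESCRIPTION")
)

# NAMESPACE: only add_one() is exported; helper() stays internal
writeLines("export(add_one)", file.path(pkg, "NAMESPACE"))

# R/: the code itself, one or more .R files
writeLines(
  c("add_one <- function(x) helper(x, 1)",
    "helper <- function(x, by) x + by"),
  file.path(pkg, "R", "add_one.R")
)

# Inspect the result
files <- list.files(pkg, recursive = TRUE)
print(files)
desc <- read.dcf(file.path(pkg, "DESCRIPTION"))
print(desc[1, "Package"])
```

If this package were installed, users could call add_one() directly, whereas helper() would only be reachable as toypkg:::helper().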
4.2 Create Required Files and Folders
We will rely heavily on the usethis package (Wickham and Bryan 2018) to create the required files and folders for us. Ensure that your working directory is set to the root of the GitHub repo you created in Section 3. The following shows a plain text listing of the directory:
```
-- .gitignore
-- rpkgs.Rproj
```

After the required files and folders have been created, the directory contains:

```
-- .gitignore
-- DESCRIPTION
-- NAMESPACE
-- R
   |__rpkgs-package.R
-- rpkgs.Rproj
```
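The DESCRIPTION, NAMESPACE and R/rpkgs-package.R files in the listing are consistent with what usethis::create_package() and usethis::use_package_doc() produce, so one plausible way to arrive at this layout is sketched below. Which calls the workshop actually uses is an assumption on my part, and a throwaway directory under tempdir() stands in for your cloned repo so that nothing in your project is modified.

```r
library(usethis)

# Hypothetical location; in the workshop this would be your cloned repo
pkg_path <- file.path(tempdir(), "rpkgs")

# Creates DESCRIPTION, NAMESPACE and the R/ directory.
# open = FALSE keeps RStudio from switching to the new project.
create_package(pkg_path, open = FALSE)

# Adds R/rpkgs-package.R, a placeholder for package-level documentation.
# with_project() temporarily makes pkg_path the active usethis project.
with_project(pkg_path, use_package_doc(open = FALSE))

# List what was created, including hidden files
created <- list.files(pkg_path, recursive = TRUE, all.files = TRUE)
print(created)
```

The exact set of extra files (for example an .Rproj file or .Rbuildignore) depends on whether the code runs inside RStudio and on your usethis version.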
4.3 Documentation
| Tag | Meaning |
|---|---|
| @return | A description of the object returned by the function |
| @param | Explanation of a function parameter |
| @inheritParams | Name of a function from which to get parameter definitions |
| @examples | Example code showing how to use the function |
| @details | Add more details on how the function works (for example, specifics of the algorithm being used) |
| @note | Add notes on the function or its use |
| @source | Add any details on the source of the code or ideas for the function |
| @references | Add any references relevant to the function |
| @importFrom | Import a function from another package to use in this function (this is especially useful for infix functions like %>% and %within%) |
| @export | Export the function, so users will have direct access to it when they load the package |

| Tag | Meaning |
|---|---|
| \code{} | Format in a typeface to look like code |
| \dontrun{} | Use with examples, to avoid running the example code during package builds and testing |
| \link{} | Link to another R function |
| \eqn{}{} | Include an inline equation |
| \deqn{}{} | Include a display equation (i.e., shown on its own line) |
| \itemize{} | Create an itemized list |
| \url{} | Include a web link |
| \href{}{} | Include a web link with custom link text |
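As an illustration of how these tags fit together, the following sketch documents a small function with roxygen2 comments (lines starting with #'). The function standardize() and its arguments are hypothetical and exist only for this example. The comments are ordinary R comments, so the code runs as is; inside a package, roxygen2 would turn them into a help file in the man directory and an export() entry in NAMESPACE.

```r
#' Standardize a numeric vector
#'
#' Centers \code{x} at its mean and scales it by its standard deviation.
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be dropped when computing the mean
#'   and standard deviation? Defaults to \code{FALSE}.
#'
#' @return A numeric vector of the same length as \code{x}.
#'
#' @details Each element is transformed as
#'   \deqn{z_i = (x_i - \bar{x}) / s}{z_i = (x_i - mean(x)) / sd(x)}
#'   where \eqn{s} is the sample standard deviation. See
#'   \code{\link[base]{scale}} for a version that works on matrices.
#'
#' @importFrom stats sd
#'
#' @examples
#' standardize(c(1, 2, 3))
#'
#' @export
standardize <- function(x, na.rm = FALSE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

standardize(c(1, 2, 3))
# → -1  0  1
```

The second argument of \deqn{}{} is the plain-text fallback shown where the equation cannot be typeset.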
5 Resources
References
Bryan, Jenny, STAT545TAs, and Jim Hester. 2019. Happy Git and GitHub for the useR. https://happygitwithr.com/.
Peng, Roger, Sean Kross, and Brooke Anderson. 2017. Mastering Software Development in R. https://bookdown.org/rdpeng/RProgDA/.
Wickham, Hadley, and Jennifer Bryan. 2018. Usethis: Automate Package and Project Setup. https://CRAN.R-project.org/package=usethis.